Skip to content

Add support for Lookup Join on Multiple Fields #131559

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

julian-elastic
Copy link
Contributor

@julian-elastic julian-elastic commented Jul 18, 2025

Add support for Lookup Join on Multiple Fields

FROM index1
| LOOKUP JOIN lookup_index on field1, field2

Removed some checks to allow lookup join on multiple fields.
Added a new interface LookupEnrichQueryGenerator, that can be used to get total number of queries and queries by position. The rest of the methods from QueryGenerator are not needed by AbstractLookupService.

That allowed the creation of a new class ExpressionQueryList implements LookupEnrichQueryGenerator, which is responsible for creating the AND query for the different fields. We will likely need to enhance it in the future to support expressions that include OR and NOT as well.

TransportRequest is enhanced to now support List<MatchConfig> matchFields instead of String matchField. This is how we pass the match fields around now. If we are communicating with an cluster that does not support LookupOnMultipleFields and it is needed by the query we will fail the query. This can happen during rolling upgrade or CCS.

@julian-elastic julian-elastic added :Analytics/ES|QL AKA ESQL Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >enhancement labels Jul 18, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @julian-elastic, I've created a changelog YAML for you.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second pass, reviewed the whole non-productive code now. Looks good so far, although I have some ideas for more tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I wonder what happens when users ask us to join on an unreasonable amount of fields:

| LOOKUP JOIN lu_idx ON field1, field2, ..., field999

We should circuit break in this case, but there is a chance we might OOM if we messed up our memory accounting somewhere.

We could add a test to HeapAttackIT in a follow-up PR, if you agree.

Copy link
Contributor Author

@julian-elastic julian-elastic Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of join on fields is limited by the number of fields in the lookup index, so it cannot be unlimited. I have some code in x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java where we throw an exception if you try to join twice on the same field.
"JOIN ON clause does not support multiple fields with the same name, found multiple instances of [{}]", so you cannot repeat them either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can create indices with many, many, many fields. The ECS schema has north of 1k fields.

With malicious intent, a user could try to bring down a cluster by issuing a join that joins on all the lookup index' fields. I think it'd be worth it to test this for good measure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can certainly make a HeapAttackIT where we try to crash the world and catch the circuit breaker.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a good choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added 2 cases to HeapAttackIT.
One is a working case with 990 join on fields. If I try to have more than 1000 we get an [sensor_data] java.lang.IllegalArgumentException: Limit of total fields [1000] has been exceeded
The other one is a circuit breaker case with multiple fields, but they cannot be too many as there is a lot of data to insert due to all the fields, and then the insert takes a long time and the test can time out.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I now got everything except some of the compute engine-only code. Looks good, but I think we can clean up the changed (transport) request a little more as it sends redundant data now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to declare this change as notable. I'm not sure what the bar is for that @leemthompo ?

Copy link
Contributor

@leemthompo leemthompo Jul 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you and @tylerperk agree sounds like this passes the notable bar 😄 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leemthompo How do I declare this change as notable? Can you point me to an example? Or you add it to some other release notes list after it is merged?

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on my side. Left a few small things

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can certainly make a HeapAttackIT where we try to crash the world and catch the circuit breaker.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a good choice.

Copy link
Contributor

@leemthompo leemthompo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs updates LGTM, thanks Julian

@julian-elastic julian-elastic marked this pull request as draft July 31, 2025 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants